Skip to content

Serialize object fix #11349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

iluuu1994
Copy link
Member

@iluuu1994 iluuu1994 commented May 30, 2023

Adapted from #11305.

The tracking of RC1 arrays was straight-forward. The most common case is likely passing a flat array of objects to serialize. Unfortunately, this will still result in an array with RC2 if the array is stored in a CV, because ZEND_SEND_VAR[_EX] increases the refcount. I'm trying to look for the corresponding ZEND_SEND_VAR opcode in the parent call frame. At least in terms of the synthetic benchmark the change seems worth it.

Benchmark

<?php

class Foo {}

$objects = [];
for ($i = 0; $i < 10000; $i++) {
    $objects[] = new Foo();
}

for ($i = 0; $i < 1000; $i++) {
    serialize($objects);
}

?>
Before this PR:
  Time (mean ± σ):     168.7 ms ±   2.2 ms    [User: 166.0 ms, System: 2.2 ms]
  Range (min … max):   164.8 ms … 172.5 ms    17 runs

Before unique:
  Time (mean ± σ):     599.7 ms ±   7.0 ms    [User: 418.0 ms, System: 179.1 ms]
  Range (min … max):   586.0 ms … 608.4 ms    10 runs

After unique:
  Time (mean ± σ):     166.1 ms ±   1.9 ms    [User: 163.4 ms, System: 2.1 ms]
  Range (min … max):   163.7 ms … 168.9 ms    17 runs

Patch

From 718dae4761de8a4d411f6466fca4b4b89c8572f4 Mon Sep 17 00:00:00 2001
From: Ilija Tovilo <ilija.tovilo@me.com>
Date: Tue, 30 May 2023 14:59:25 +0200
Subject: [PATCH] Skip serialize object recording when array paths are rc1

Additionally, handle the common case of passing a CV-array with RC1 to
serialize. This results in RC2 which we can safely treat as RC1 because no
additional internal references are possible.
---
 .../serialize/serialization_objects_019.phpt  | 33 +++++++---
 ext/standard/var.c                            | 66 ++++++++++++++-----
 2 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/ext/standard/tests/serialize/serialization_objects_019.phpt b/ext/standard/tests/serialize/serialization_objects_019.phpt
index 505fc3d9af..916a899b17 100644
--- a/ext/standard/tests/serialize/serialization_objects_019.phpt
+++ b/ext/standard/tests/serialize/serialization_objects_019.phpt
@@ -2,16 +2,29 @@
 Object serialization with references
 --FILE--
 <?php
-function gen() {
-    $s = new stdClass;
-    $r = new stdClass;
-    $r->a = [$s];
-    $r->b = $r->a;
-    return $r;
+
+function rcn() {
+    $root = new stdClass;
+    $end = new stdClass;
+    $root->a = [$end];
+    $root->b = $root->a;
+    unset($end);
+    var_dump(serialize($root));
 }
-var_dump(serialize(gen()));
-?>
---EXPECTF--
-string(78) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;O:8:"stdClass":0:{}}s:1:"b";a:1:{i:0;r:3;}}"
 
+function rcn_rc1() {
+    $root = new stdClass;
+    $end = new stdClass;
+    $root->a = [[$end]];
+    $root->b = $root->a;
+    unset($end);
+    var_dump(serialize($root));
+}
 
+rcn();
+rcn_rc1();
+
+?>
+--EXPECT--
+string(78) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;O:8:"stdClass":0:{}}s:1:"b";a:1:{i:0;r:3;}}"
+string(98) "O:8:"stdClass":2:{s:1:"a";a:1:{i:0;a:1:{i:0;O:8:"stdClass":0:{}}}s:1:"b";a:1:{i:0;a:1:{i:0;r:4;}}}"
diff --git a/ext/standard/var.c b/ext/standard/var.c
index 641e0275f5..08a9996472 100644
--- a/ext/standard/var.c
+++ b/ext/standard/var.c
@@ -652,9 +652,13 @@ PHP_FUNCTION(var_export)
 }
 /* }}} */
 
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash);
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc);
 
-static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var) /* {{{ */
+/**
+ * @param bool unique Whether the element can only appear once in the serialized data. It can appear
+ *                    multiple times if it is embedded in an array with RC > 1.
+ */
+static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var, bool unique) /* {{{ */
 {
 	zval *zv;
 	zend_ulong key;
@@ -666,6 +670,10 @@ static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var) /
 		/* pass */
 	} else if (Z_TYPE_P(var) != IS_OBJECT) {
 		return 0;
+	} else if (unique
+	 && Z_REFCOUNT_P(var) == 1
+	 && (Z_OBJ_P(var)->properties == NULL || GC_REFCOUNT(Z_OBJ_P(var)->properties) == 1)) {
+		return 0;
 	}
 
 	/* References to objects are treated as if the reference didn't exist */
@@ -921,7 +929,7 @@ static int php_var_serialize_get_sleep_props(
 }
 /* }}} */
 
-static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash) /* {{{ */
+static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash, bool unique) /* {{{ */
 {
 	smart_str_append_unsigned(buf, count);
 	smart_str_appendl(buf, ":{", 2);
@@ -951,19 +959,19 @@ static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable
 			if (Z_TYPE_P(data) == IS_ARRAY) {
 				if (UNEXPECTED(Z_IS_RECURSIVE_P(data))
 					|| UNEXPECTED(Z_TYPE_P(struc) == IS_ARRAY && Z_ARR_P(data) == Z_ARR_P(struc))) {
-					php_add_var_hash(var_hash, struc);
+					php_add_var_hash(var_hash, struc, unique);
 					smart_str_appendl(buf, "N;", 2);
 				} else {
 					if (Z_REFCOUNTED_P(data)) {
 						Z_PROTECT_RECURSION_P(data);
 					}
-					php_var_serialize_intern(buf, data, var_hash);
+					php_var_serialize_intern(buf, data, var_hash, unique, false);
 					if (Z_REFCOUNTED_P(data)) {
 						Z_UNPROTECT_RECURSION_P(data);
 					}
 				}
 			} else {
-				php_var_serialize_intern(buf, data, var_hash);
+				php_var_serialize_intern(buf, data, var_hash, unique, false);
 			}
 		} ZEND_HASH_FOREACH_END();
 	}
@@ -978,13 +986,13 @@ static void php_var_serialize_class(smart_str *buf, zval *struc, HashTable *ht,
 	if (php_var_serialize_get_sleep_props(&props, struc, ht) == SUCCESS) {
 		php_var_serialize_class_name(buf, struc);
 		php_var_serialize_nested_data(
-			buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash);
+			buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash, GC_REFCOUNT(&props) == 1);
 	}
 	zend_hash_destroy(&props);
 }
 /* }}} */
 
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash) /* {{{ */
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc) /* {{{ */
 {
 	zend_long var_already;
 	HashTable *myht;
@@ -993,7 +1001,7 @@ static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_
 		return;
 	}
 
-	if (var_hash && (var_already = php_add_var_hash(var_hash, struc))) {
+	if (var_hash && (var_already = php_add_var_hash(var_hash, struc, unique))) {
 		if (var_already == -1) {
 			/* Reference to an object that failed to serialize, replace with null. */
 			smart_str_appendl(buf, "N;", 2);
@@ -1102,7 +1110,7 @@ again:
 						if (Z_ISREF_P(data) && Z_REFCOUNT_P(data) == 1) {
 							data = Z_REFVAL_P(data);
 						}
-						php_var_serialize_intern(buf, data, var_hash);
+						php_var_serialize_intern(buf, data, var_hash, Z_REFCOUNT_P(&retval) == 1, false);
 					} ZEND_HASH_FOREACH_END();
 					smart_str_appendc(buf, '}');
 
@@ -1224,7 +1232,7 @@ again:
 								prop = Z_REFVAL_P(prop);
 							}
 
-							php_var_serialize_intern(buf, prop, var_hash);
+							php_var_serialize_intern(buf, prop, var_hash, true, false);
 						}
 						smart_str_appendc(buf, '}');
 					} else {
@@ -1239,7 +1247,7 @@ again:
 				if (count > 0 && incomplete_class) {
 					--count;
 				}
-				php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash);
+				php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash, GC_REFCOUNT(myht) == 1);
 				zend_release_properties(myht);
 				return;
 			}
@@ -1247,7 +1255,7 @@ again:
 			smart_str_appendl(buf, "a:", 2);
 			myht = Z_ARRVAL_P(struc);
 			php_var_serialize_nested_data(
-				buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash);
+				buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash, unique && (GC_REFCOUNT(myht) - (subtract_rc ? 1 : 0)) == 1);
 			return;
 		case IS_REFERENCE:
 			struc = Z_REFVAL_P(struc);
@@ -1261,11 +1269,17 @@ again:
 
 PHPAPI void php_var_serialize(smart_str *buf, zval *struc, php_serialize_data_t *data) /* {{{ */
 {
-	php_var_serialize_intern(buf, struc, *data);
+	php_var_serialize_intern(buf, struc, *data, true, false);
 	smart_str_0(buf);
 }
 /* }}} */
 
+static void php_var_serialize_cv(smart_str *buf, zval *struc, php_serialize_data_t *data)
+{
+	php_var_serialize_intern(buf, struc, *data, true, true);
+	smart_str_0(buf);
+}
+
 PHPAPI php_serialize_data_t php_var_serialize_init(void) {
 	struct php_serialize_data *d;
 	/* fprintf(stderr, "SERIALIZE_INIT      == lock: %u, level: %u\n", BG(serialize_lock), BG(serialize).level); */
@@ -1306,8 +1320,30 @@ PHP_FUNCTION(serialize)
 		Z_PARAM_ZVAL(struc)
 	ZEND_PARSE_PARAMETERS_END();
 
+	bool data_is_cv = false;
+	if (Z_TYPE_P(struc) == IS_ARRAY
+	 && !(GC_FLAGS(Z_ARRVAL_P(struc)) & IS_ARRAY_IMMUTABLE)
+	 && EG(current_execute_data)
+	 && EG(current_execute_data)->prev_execute_data) {
+		zend_execute_data *execute_data = EG(current_execute_data)->prev_execute_data;
+		if (execute_data->func && ZEND_USER_CODE(execute_data->func->type)) {
+			zend_op_array *func = &execute_data->func->op_array;
+			const zend_op *opline = execute_data->opline;
+			if (func->opcodes < opline) {
+				const zend_op *prev_opline = opline - 1;
+				if ((prev_opline->opcode == ZEND_SEND_VAR || prev_opline->opcode == ZEND_SEND_VAR_EX) && prev_opline->op1_type == IS_CV) {
+					data_is_cv = true;
+				}
+			}
+		}
+	}
+
 	PHP_VAR_SERIALIZE_INIT(var_hash);
-	php_var_serialize(&buf, struc, &var_hash);
+	if (data_is_cv) {
+		php_var_serialize_cv(&buf, struc, &var_hash);
+	} else {
+		php_var_serialize(&buf, struc, &var_hash);
+	}
 	PHP_VAR_SERIALIZE_DESTROY(var_hash);
 
 	if (EG(exception)) {
-- 
2.40.1

The removed if in var.c caused serialize to not handle object references
correctly under certain circumstances.

See tests/serialize/serialization_objects_019.phpt

The bug was originally introduced in commit 6c5942f, and the problematic
line was last modified in commit bb0b4eb. (Fixes oss-fuzz #44954)

The testcase from bb0b4eb still passes.
@iluuu1994 iluuu1994 requested a review from dstogov May 30, 2023 17:49
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to play with the patch.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory error should be fixed first.

@dstogov
Copy link
Member

dstogov commented Jun 5, 2023

I propose the following patch on top of your PR

diff --git a/ext/standard/var.c b/ext/standard/var.c
index 08a9996472..303e77b850 100644
--- a/ext/standard/var.c
+++ b/ext/standard/var.c
@@ -652,13 +652,12 @@ PHP_FUNCTION(var_export)
 }
 /* }}} */
 
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc);
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool in_rcn_array, bool is_root);
 
 /**
- * @param bool unique Whether the element can only appear once in the serialized data. It can appear
- *                    multiple times if it is embedded in an array with RC > 1.
+ * @param bool in_rcn_array Whether the element appears in an array with RC > 1.
  */
-static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var, bool unique) /* {{{ */
+static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var, bool in_rcn_array) /* {{{ */
 {
 	zval *zv;
 	zend_ulong key;
@@ -670,7 +669,7 @@ static inline zend_long php_add_var_hash(php_serialize_data_t data, zval *var, b
 		/* pass */
 	} else if (Z_TYPE_P(var) != IS_OBJECT) {
 		return 0;
-	} else if (unique
+	} else if (!in_rcn_array
 	 && Z_REFCOUNT_P(var) == 1
 	 && (Z_OBJ_P(var)->properties == NULL || GC_REFCOUNT(Z_OBJ_P(var)->properties) == 1)) {
 		return 0;
@@ -929,7 +928,7 @@ static int php_var_serialize_get_sleep_props(
 }
 /* }}} */
 
-static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash, bool unique) /* {{{ */
+static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable *ht, uint32_t count, bool incomplete_class, php_serialize_data_t var_hash, bool in_rcn_array) /* {{{ */
 {
 	smart_str_append_unsigned(buf, count);
 	smart_str_appendl(buf, ":{", 2);
@@ -959,19 +958,19 @@ static void php_var_serialize_nested_data(smart_str *buf, zval *struc, HashTable
 			if (Z_TYPE_P(data) == IS_ARRAY) {
 				if (UNEXPECTED(Z_IS_RECURSIVE_P(data))
 					|| UNEXPECTED(Z_TYPE_P(struc) == IS_ARRAY && Z_ARR_P(data) == Z_ARR_P(struc))) {
-					php_add_var_hash(var_hash, struc, unique);
+					php_add_var_hash(var_hash, struc, in_rcn_array);
 					smart_str_appendl(buf, "N;", 2);
 				} else {
 					if (Z_REFCOUNTED_P(data)) {
 						Z_PROTECT_RECURSION_P(data);
 					}
-					php_var_serialize_intern(buf, data, var_hash, unique, false);
+					php_var_serialize_intern(buf, data, var_hash, in_rcn_array, false);
 					if (Z_REFCOUNTED_P(data)) {
 						Z_UNPROTECT_RECURSION_P(data);
 					}
 				}
 			} else {
-				php_var_serialize_intern(buf, data, var_hash, unique, false);
+				php_var_serialize_intern(buf, data, var_hash, in_rcn_array, false);
 			}
 		} ZEND_HASH_FOREACH_END();
 	}
@@ -986,13 +985,13 @@ static void php_var_serialize_class(smart_str *buf, zval *struc, HashTable *ht,
 	if (php_var_serialize_get_sleep_props(&props, struc, ht) == SUCCESS) {
 		php_var_serialize_class_name(buf, struc);
 		php_var_serialize_nested_data(
-			buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash, GC_REFCOUNT(&props) == 1);
+			buf, struc, &props, zend_hash_num_elements(&props), /* incomplete_class */ 0, var_hash, GC_REFCOUNT(&props) > 1);
 	}
 	zend_hash_destroy(&props);
 }
 /* }}} */
 
-static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool unique, bool subtract_rc) /* {{{ */
+static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_data_t var_hash, bool in_rcn_array, bool is_root) /* {{{ */
 {
 	zend_long var_already;
 	HashTable *myht;
@@ -1001,7 +1000,7 @@ static void php_var_serialize_intern(smart_str *buf, zval *struc, php_serialize_
 		return;
 	}
 
-	if (var_hash && (var_already = php_add_var_hash(var_hash, struc, unique))) {
+	if (var_hash && (var_already = php_add_var_hash(var_hash, struc, in_rcn_array))) {
 		if (var_already == -1) {
 			/* Reference to an object that failed to serialize, replace with null. */
 			smart_str_appendl(buf, "N;", 2);
@@ -1110,7 +1109,7 @@ again:
 						if (Z_ISREF_P(data) && Z_REFCOUNT_P(data) == 1) {
 							data = Z_REFVAL_P(data);
 						}
-						php_var_serialize_intern(buf, data, var_hash, Z_REFCOUNT_P(&retval) == 1, false);
+						php_var_serialize_intern(buf, data, var_hash, Z_REFCOUNT_P(&retval) > 1, false);
 					} ZEND_HASH_FOREACH_END();
 					smart_str_appendc(buf, '}');
 
@@ -1232,7 +1231,7 @@ again:
 								prop = Z_REFVAL_P(prop);
 							}
 
-							php_var_serialize_intern(buf, prop, var_hash, true, false);
+							php_var_serialize_intern(buf, prop, var_hash, false, false);
 						}
 						smart_str_appendc(buf, '}');
 					} else {
@@ -1247,7 +1246,7 @@ again:
 				if (count > 0 && incomplete_class) {
 					--count;
 				}
-				php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash, GC_REFCOUNT(myht) == 1);
+				php_var_serialize_nested_data(buf, struc, myht, count, incomplete_class, var_hash, GC_REFCOUNT(myht) > 1);
 				zend_release_properties(myht);
 				return;
 			}
@@ -1255,7 +1254,8 @@ again:
 			smart_str_appendl(buf, "a:", 2);
 			myht = Z_ARRVAL_P(struc);
 			php_var_serialize_nested_data(
-				buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash, unique && (GC_REFCOUNT(myht) - (subtract_rc ? 1 : 0)) == 1);
+				buf, struc, myht, zend_array_count(myht), /* incomplete_class */ 0, var_hash,
+					!is_root && (in_rcn_array || GC_REFCOUNT(myht) > 1));
 			return;
 		case IS_REFERENCE:
 			struc = Z_REFVAL_P(struc);
@@ -1269,17 +1269,11 @@ again:
 
 PHPAPI void php_var_serialize(smart_str *buf, zval *struc, php_serialize_data_t *data) /* {{{ */
 {
-	php_var_serialize_intern(buf, struc, *data, true, false);
+	php_var_serialize_intern(buf, struc, *data, false, true);
 	smart_str_0(buf);
 }
 /* }}} */
 
-static void php_var_serialize_cv(smart_str *buf, zval *struc, php_serialize_data_t *data)
-{
-	php_var_serialize_intern(buf, struc, *data, true, true);
-	smart_str_0(buf);
-}
-
 PHPAPI php_serialize_data_t php_var_serialize_init(void) {
 	struct php_serialize_data *d;
 	/* fprintf(stderr, "SERIALIZE_INIT      == lock: %u, level: %u\n", BG(serialize_lock), BG(serialize).level); */
@@ -1320,30 +1314,8 @@ PHP_FUNCTION(serialize)
 		Z_PARAM_ZVAL(struc)
 	ZEND_PARSE_PARAMETERS_END();
 
-	bool data_is_cv = false;
-	if (Z_TYPE_P(struc) == IS_ARRAY
-	 && !(GC_FLAGS(Z_ARRVAL_P(struc)) & IS_ARRAY_IMMUTABLE)
-	 && EG(current_execute_data)
-	 && EG(current_execute_data)->prev_execute_data) {
-		zend_execute_data *execute_data = EG(current_execute_data)->prev_execute_data;
-		if (execute_data->func && ZEND_USER_CODE(execute_data->func->type)) {
-			zend_op_array *func = &execute_data->func->op_array;
-			const zend_op *opline = execute_data->opline;
-			if (func->opcodes < opline) {
-				const zend_op *prev_opline = opline - 1;
-				if ((prev_opline->opcode == ZEND_SEND_VAR || prev_opline->opcode == ZEND_SEND_VAR_EX) && prev_opline->op1_type == IS_CV) {
-					data_is_cv = true;
-				}
-			}
-		}
-	}
-
 	PHP_VAR_SERIALIZE_INIT(var_hash);
-	if (data_is_cv) {
-		php_var_serialize_cv(&buf, struc, &var_hash);
-	} else {
-		php_var_serialize(&buf, struc, &var_hash);
-	}
+	php_var_serialize(&buf, struc, &var_hash);
 	PHP_VAR_SERIALIZE_DESTROY(var_hash);
 
 	if (EG(exception)) {

@iluuu1994
Copy link
Member Author

@dstogov Much better naming, thank you!

@iluuu1994 iluuu1994 force-pushed the serialize_object_fix branch from 718dae4 to e374a03 Compare June 5, 2023 20:30
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final patch definitely looks better. I think it's OK now.

@nielsdos can you please make a final review

@nielsdos
Copy link
Member

nielsdos commented Jun 6, 2023

The high-level idea makes sense. However I found a variation of the original code that still doesn't work. I only had a brief look. I didn't have time yet to do more testing or debugging to find the root cause. May be the root cause is different.

<?php
function test() {
    $root = new stdClass;
    $end = new stdClass;
    $root->a = [$end];
    $end->c = [$root->a, $root->a];
    return $root;
}

var_dump(unserialize($s = serialize(test())));
var_dump(test());
var_dump($s);

This gives this output:

object(stdClass)#1 (1) {
  ["a"]=>
  array(1) {
    [0]=>
    object(stdClass)#3 (1) {
      ["c"]=>
      array(2) {
        [0]=>
        NULL
        [1]=>
        NULL
      }
    }
  }
}
object(stdClass)#1 (1) {
  ["a"]=>
  array(1) {
    [0]=>
    object(stdClass)#3 (1) {
      ["c"]=>
      array(2) {
        [0]=>
        *RECURSION*
        [1]=>
        *RECURSION*
      }
    }
  }
}
string(82) "O:8:"stdClass":1:{s:1:"a";a:1:{i:0;O:8:"stdClass":1:{s:1:"c";a:2:{i:0;N;i:1;N;}}}}"

The first var_dump shows the unserialisation after it was serialised, and the second dump shows what I expected to see. The entries of the c array are NULL.
In the last line we see the result from serialize(). We see that $end->c is wrongly encoded with two NULL elements.
This problem reproduces on current master, with the old PR and with this PR.

@iluuu1994
Copy link
Member Author

@nielsdos Thank you for checking! Your test case seems to fail here:

php-src/ext/standard/var.c

Lines 942 to 945 in a02f7f2

if (UNEXPECTED(Z_IS_RECURSIVE_P(data))
|| UNEXPECTED(Z_TYPE_P(struc) == IS_ARRAY && Z_ARR_P(data) == Z_ARR_P(struc))) {
php_add_var_hash(var_hash, struc);
smart_str_appendl(buf, "N;", 2);

It seems that non-referencial recursive arrays are not supported. Instead, we just replace them with NULL when encountering them. I think that's because we don't actually record them and thus don't know which element to reference. Either way, I think this is an unrelated issue.

@nielsdos
Copy link
Member

nielsdos commented Jun 6, 2023

Okay this looks good then.
Other than the pre-existing problem I didn't find any new.

@iluuu1994
Copy link
Member Author

@nielsdos Thank you!

@iluuu1994
Copy link
Member Author

I discovered another problematic case.

$root = [];
$root[] = new stdClass;
$root[] = &$root;
echo serialize($root), "\n";

a:2:{i:0;O:8:"stdClass":0:{}i:1;a:2:{i:0;O:8:"stdClass":0:{}i:1;R:3;}}

The issue here is that we're protecting nested arrays but not the root array. This can be solved by moving the array protection/recursion check from php_var_serialize_nested_data to php_var_serialize_intern.

php-src/ext/standard/var.c

Lines 954 to 966 in 4fcb3e0

if (UNEXPECTED(Z_IS_RECURSIVE_P(data))
|| UNEXPECTED(Z_TYPE_P(struc) == IS_ARRAY && Z_ARR_P(data) == Z_ARR_P(struc))) {
php_add_var_hash(var_hash, struc);
smart_str_appendl(buf, "N;", 2);
} else {
if (Z_REFCOUNTED_P(data)) {
Z_PROTECT_RECURSION_P(data);
}
php_var_serialize_intern(buf, data, var_hash);
if (Z_REFCOUNTED_P(data)) {
Z_UNPROTECT_RECURSION_P(data);
}
}

Unfortunately, there's another issue. I noticed that there are existing tests that look off.

  • ext/standard/tests/serialize/serialization_arrays_001.phpt
  • ext/standard/tests/serialize/serialization_arrays_004.phpt
  • ext/standard/tests/serialize/serialization_arrays_005.phpt

Specifically, all these tests contain a simple, self-recursive array like a -> a. However, since serialize accepts the $value parameter by-value, the root array is not stored in php_serialize_data_t.ht. Without this patch, the recursive array is visited again, and added to php_serialize_data_t.ht this time because it is wrapped in a reference. On the third pass, the reference is replaced with a R:n; placeholder instead of visiting the element again. This leads to the following object graph: a -> a' -> a'. This is clearly wrong.

With the proposed solution, we map flag the root array as visited right away. The second encounter fails because the array is already flagged as visited, and thus replaced with N;. So the result is a -> null. This is also wrong. I'm not sure if this cannot trivially be solved without passing the $value by-reference. So I suppose the question is which one is more wrong.

Tbh, I consider recursive arrays mostly useless and would prefer if PHP didn't support them at all. I'm happy completely ignoring this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants